using StatsBase, DataFrames, DataFrameMacros
using RCall, CategoricalArrays, TexTables
using Distributions, Gadfly, Compose, MLJ
using HypothesisTests, CSV, StableRNGs
using AnovaGLM, Effects, MultipleTesting
using MLJ: schema
Continuous Outcomes
2 Confidence Intervals around the Mean
The data set we will be using for this laboratory is from Bernard, GR, et al. (1997) The effects of ibuprofen on the physiology and survival of patients with sepsis, N Engl J Med 336(13): 912–918. Here is an abbreviated version of the abstract.
“\(\dots\) we conducted a randomized, double-blind, placebo-controlled trial of intravenous ibuprofen\(\dots\) in 455 patients who had sepsis, defined as fever, tachycardia, tachypnea, and acute failure of at least one organ system. In the ibuprofen group, but not the placebo group, there were significant declines in [various measures including] temperature\(\dots\) However, treatment with ibuprofen did not reduce the incidence or duration of shock or the acute respiratory distress syndrome and did not significantly improve the rate of survival at 30 days (mortality, 37 percent with ibuprofen vs. 40 percent with placebo).”
bernard = rcopy(R"pubh::Bernard")
bernard |> schema
┌──────────┬────────────────────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├──────────┼────────────────────────────┼──────────────────────────────────┤
│ id │ Continuous │ Float64 │
│ treat │ Multiclass{2} │ CategoricalValue{String, UInt32} │
│ race │ Multiclass{3} │ CategoricalValue{String, UInt32} │
│ fate │ Multiclass{2} │ CategoricalValue{String, UInt32} │
│ apache │ Union{Missing, Continuous} │ Union{Missing, Float64} │
│ o2del │ Union{Missing, Continuous} │ Union{Missing, Float64} │
│ followup │ Continuous │ Float64 │
│ temp0 │ Continuous │ Float64 │
│ temp10 │ Union{Missing, Continuous} │ Union{Missing, Float64} │
└──────────┴────────────────────────────┴──────────────────────────────────┘
Let’s take a look at the distribution of baseline temperature.
qq_plot(bernard.temp0; ylab="Baseline Temperature (°C)")
Let’s assume normality and estimate the 95% CI around the mean baseline temperature for all patients.
cis(bernard.temp0)
(outcome = 38.0149207202506,
lower = 37.91149233237877,
upper = 38.11834910812243,)
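Under the normality assumption, the interval returned by cis is the usual \(\bar x \pm t_{n-1,\,0.975} \times s/\sqrt{n}\). A minimal sketch of that computation with a made-up vector (not the Bernard data):

```julia
using Distributions, Statistics

x = [36.8, 37.2, 38.0, 38.4, 37.9, 38.6, 37.5, 38.1]  # hypothetical temperatures

n = length(x)
x̄ = mean(x)
se = std(x) / sqrt(n)              # standard error of the mean
t = quantile(TDist(n - 1), 0.975)  # two-sided 95% critical value

ci = (lower = x̄ - t * se, upper = x̄ + t * se)
```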
What about the 95% CI around the mean temperature after 36 hr of treatment?
cis(@subset(bernard, !ismissing(:temp10)).temp10)
(outcome = 37.31448060654226,
lower = 37.21970170268702,
upper = 37.409259510397504,)
We used !ismissing inside @subset to remove missing values from the calculation; otherwise, the result would be missing.
We can estimate bootstrap CIs with bst from the R package pubh.
pubh.bst(bernard.temp10) |> rcopy
| Row | stat | estimate | %CI | lower | upper |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | mean | 37.31 | 95.0 | 37.22 | 37.4 |
2.1 Tests for Means
2.1.1 One-sample \(t\)-tests
Perform the following two-sided one-sample \(t\)-test, where the normal core temperature is 37\(^{\circ}C\). Note that because sepsis was diagnosed in this case by a set of symptoms including fever, you would be very surprised if there were no evidence of a difference between the mean baseline temperature of sepsis patients and the normal body temperature.
If we define \(\bar x\) as the mean baseline temperature, our two hypotheses are:
- \(H_0: \bar x = 37^{\circ}C\)
- \(H_A: \bar x \neq 37^{\circ}C\)
Take a look at the documentation of OneSampleTTest to get familiarised with its options. By default, we are using a two-sided test with a significance level of \(\alpha=0.05\) (95% CI).
OneSampleTTest(bernard.temp0, 37)
One sample t-test
-----------------
Population details:
parameter of interest: Mean
value under h_0: 37
point estimate: 38.0149
95% confidence interval: (37.91, 38.12)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: <1e-60
Details:
number of observations: 455
t-statistic: 19.23306214688628
degrees of freedom: 454
empirical standard error: 0.05276958564889316
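The reported t-statistic is just \((\bar x - \mu_0)/(s/\sqrt{n})\). A quick self-contained check of that arithmetic on made-up numbers (not the lab data):

```julia
using HypothesisTests, Statistics

x = [37.4, 38.1, 38.9, 37.8, 38.5, 39.0, 38.2]  # hypothetical temperatures
μ0 = 37.0

t_manual = (mean(x) - μ0) / (std(x) / sqrt(length(x)))
test = OneSampleTTest(x, μ0)
# t_manual matches test.t, the statistic printed in the test summary
```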
We are making a one-sample test, comparing the mean baseline temperature against the reference value of \(\mu\) = 37 \(\degree\) C. Because the test is two-sided, we reject the null hypothesis if our mean is significantly greater or significantly less than \(\mu\) = 37 \(\degree\) C. The probability of observing our sample mean if the true mean were \(\mu\) = 37 \(\degree\) C is \(p\) < 0.001. The mean baseline temperature in our sample was \(\bar x\) = 38.01 \(\degree\) C (95% CI: 37.91 \(\degree\) C, 38.12 \(\degree\) C).
There are deviations from normality in baseline temperature; lower temperatures, in particular, are unlikely to come from a normal distribution.
Our sample is large enough to not be worried about small deviations from normality. In healthy subjects, the temperature would be expected to be centred, and normally distributed.
2.2 Paired \(t\)-tests
Assume we want to know if there was a significant decrease in the mean temperature at 36 hours in the Placebo group. The \(t\)-test assumes that data is independent. In this example, the same subjects were measured twice: at baseline and 36 hours. This is a classic example of a paired analysis.
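A paired \(t\)-test is just a one-sample \(t\)-test on the within-subject differences; the two-vector form of OneSampleTTest used below does exactly that. A sketch with made-up pairs (not the Bernard data):

```julia
using HypothesisTests

before = [38.2, 37.9, 38.5, 38.0, 37.7]   # hypothetical baseline values
after  = [37.5, 37.8, 37.9, 37.6, 37.4]   # hypothetical 36-hr values

p_paired = pvalue(OneSampleTTest(after, before))    # paired (two-vector) form
p_diff   = pvalue(OneSampleTTest(after .- before))  # one-sample test on differences
# both are the same test
```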
placebo = @select(
@subset(bernard, :treat == "Placebo"),
:temp0, :temp10
)
dropmissing!(placebo);
OneSampleTTest(placebo.temp10, placebo.temp0)
One sample t-test
-----------------
Population details:
parameter of interest: Mean
value under h_0: 0
point estimate: -0.498563
95% confidence interval: (-0.6428, -0.3543)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: <1e-09
Details:
number of observations: 201
t-statistic: -6.816054633057416
degrees of freedom: 200
empirical standard error: 0.07314536404958778
The mean decrease in temperature from baseline to 36 hr in the placebo group was 0.50 \(\degree\) C (95% CI: 0.35 \(\degree\) C, 0.64 \(\degree\) C). There was a significant placebo effect (\(p<0.001\)), as the 95% CI for the temperature change in the placebo group did not include the null value of zero.
2.3 Two-sample \(t\)-tests
Our real question of interest is to test if given Ibuprofen was statistically different from given placebo in patients with sepsis. This is a two-sided, two-sample hypothesis. The two samples are independent (treatment groups), and our variable of interest is temp_change.
First, we calculate the difference in temperatures.
bern = @select(bernard, :temp0, :temp10, :treat)
@transform!(bern, :temp_change = :temp10 - :temp0)
dropmissing!(bern);
One of the assumptions is that the distribution of temp_change is normal for each group. The other big assumption is that the variances are the same. To compare variances, we perform a variance test. The null hypothesis is that the ratio of the two variances is equal to one (same variance) and the alternative is that it is different from one. A \(p > 0.05\) means that there is no statistical evidence of a difference between the two variances and, therefore, that the assumption of homogeneity of variances holds.
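The variance test compares the ratio of sample variances against an \(F\) distribution with \(n_1-1\) and \(n_2-1\) degrees of freedom. A self-contained sketch with made-up samples (not the trial data):

```julia
using Distributions, Statistics

x = [-0.5, 0.3, -1.2, 0.8, 0.1, -0.7, 0.9, -0.2]   # hypothetical group 1
y = [-1.5, 2.1, -2.2, 1.8, 0.4, -1.9, 2.5, -0.6]   # hypothetical group 2 (wider spread)

F = var(x) / var(y)                        # ratio of sample variances
d = FDist(length(x) - 1, length(y) - 1)    # null distribution of the ratio
p = 2 * min(cdf(d, F), ccdf(d, F))         # two-sided p-value
```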
First, we perform a standard descriptive analysis on temp_change.
summarize_by(bern, :treat, :temp_change)
          |             | Obs | Mean | Std. Dev. | Min | Max
--------------------------------------------------------------------
Placebo | temp_change | 201 | -0.499 | 1.037 | -3.278 | 2.900
--------------------------------------------------------------------
Ibuprofen | temp_change | 198 | -1.034 | 1.133 | -4.100 | 2.500
The R package pubh has the function estat, which also allows us to generate a table of descriptive statistics.
pubh.estat(@formula(temp_change ~ treat), data=bern) |> rcopy
| Row | Outcome | treat | N | Min | Max | Mean | Median | SD | CV |
|---|---|---|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | temp_change | Placebo | 201.0 | -3.28 | 2.9 | -0.5 | -0.5 | 1.04 | -2.08 |
| 2 | | Ibuprofen | 198.0 | -4.1 | 2.5 | -1.03 | -1.11 | 1.13 | -1.1 |
Construct a QQ-plot of temp_change from subjects by treatment group, against the standard normal distribution to check for the normality assumption.
qq_plot(bern.temp_change, ylab = "Temperature Change (°C)")
We perform a variance test with VarianceFTest.
VarianceFTest(
@subset(bern, :treat == "Ibuprofen").temp_change,
@subset(bern, :treat == "Placebo").temp_change
) |> pvalue |> r3
0.212
HypothesisTests is not designed to work with data frames; hence, one needs to provide each vector. As an alternative to the previous code, we can generate these vectors with the function vec_group. The arguments of vec_group are the data frame, the outcome (continuous) variable and the group (categorical) variable.
VarianceFTest(
vec_group(bern, :temp_change, :treat)...
) |> pvalue |> r3
0.212
Now, let’s test the null hypothesis that the mean temperature change between the two groups is the same.
EqualVarianceTTest(
vec_group(bern, :temp_change, :treat)...
)
Two sample t-test (equal variance)
----------------------------------
Population details:
parameter of interest: Mean difference
value under h_0: 0
point estimate: 0.535107
95% confidence interval: (0.3214, 0.7489)
Test summary:
outcome with 95% confidence: reject h_0
two-sided p-value: <1e-05
Details:
number of observations: [201,198]
t-statistic: 4.9216804918824
degrees of freedom: 397
empirical standard error: 0.10872455012044961
3 Non-parametric Tests
3.1 Mann-Whitney
In some disciplines, researchers are not interested in the magnitude of the difference, e.g., when there is no precise knowledge of the interpretation of the scales. Under those circumstances, they may choose a priori, to use non-parametric tests for relatively small samples.
Non-parametric tests are also used when violations to the \(t\)-test assumptions occur.
We never, ever perform both a parametric and a non-parametric test. That decision has to be taken a priori, given our assumptions. When we perform both tests, we may fall into the temptation to report the more beneficial results; in other words, by performing both tests, we introduce bias in our analysis.
We will compare energy expenditure between lean and obese women.
energy = rcopy(R"ISwR::energy")
energy |> schema
┌─────────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├─────────┼───────────────┼──────────────────────────────────┤
│ expend │ Continuous │ Float64 │
│ stature │ Multiclass{2} │ CategoricalValue{String, UInt32} │
└─────────┴───────────────┴──────────────────────────────────┘
Calculate descriptive statistics for variable expend by stature from the energy dataset.
pubh.estat(@formula(expend ~ stature), data=energy) |> rcopy
| Row | Outcome | stature | N | Min | Max | Mean | Median | SD | CV |
|---|---|---|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | expend | lean | 13.0 | 6.13 | 10.88 | 8.07 | 7.9 | 1.24 | 0.15 |
| 2 | | obese | 9.0 | 8.79 | 12.79 | 10.3 | 9.69 | 1.4 | 0.14 |
What are your general observations from the descriptive analysis?
On average, obese women have a higher energy expenditure than lean women, but we do not know if that difference is significant. Both groups of women show similar variances in energy expenditure and a positive skew in the distribution (the medians are somewhat lower than the means).
Given that our samples are relatively small (less than 30 observations per group), the best way to graphically compare distributions is by strip charts.
Construct a strip chart comparing the energy expenditure by stature.
strip_error(
energy,
:stature, :expend,
xlab = "Stature",
ylab = "Energy expenditure (MJ)"
)
We can check graphically for normality. Strictly speaking, it is the mean difference that has to be normally distributed; for simplicity, we will look at the distribution of energy expenditure for each group, as that is a good indicator of normality of the mean difference.
let
energy_qq = DataFrames.combine(
groupby(energy, :stature),
:expend => (x->fit(Normal, x)) => :d
)
plot(
energy_qq,
x=:d, y=energy.expend, color=:stature,
Stat.qq,
Guide.xlabel("Normal quantiles"),
Guide.ylabel("Energy expenditure (MJ)")
)
end
What about variance equality?
VarianceFTest(
vec_group(energy, :expend, :stature)...
) |> pvalue |> r3
0.68
The non-parametric counterpart of the \(t\)-test is the Wilcoxon-Mann-Whitney test, more commonly known as the Mann-Whitney test.
MannWhitneyUTest(
vec_group(energy, :expend, :stature)...
) |> pvalue |> r3
0.002
3.1.1 Paired data
Paired tests are used when there are two measurements on the same experimental unit. We will use data on pre- and post-menstrual energy intake in a group of 11 women.
intake = rcopy(R"ISwR::intake")
intake |> schema
┌───────┬────────────┬─────────┐
│ names │ scitypes   │ types   │
├───────┼────────────┼─────────┤
│ pre   │ Continuous │ Float64 │
│ post  │ Continuous │ Float64 │
└───────┴────────────┴─────────┘
We can start, as usual, with descriptive statistics.
estat(intake)
| Row | Variable | n | Median | Mean | SD | CV |
|---|---|---|---|---|---|---|
| | Symbol | Int64 | Float64 | Float64 | Float64 | Float64 |
| 1 | pre | 11 | 6515.0 | 6753.64 | 1142.12 | 0.169 |
| 2 | post | 11 | 5265.0 | 5433.18 | 1216.83 | 0.224 |
Let’s work on the assumption that we are not interested in the magnitude of the difference but only in whether that difference is significant or not. Under those circumstances, and given the small sample size, we will perform a non-parametric test equivalent to the paired \(t\)-test.
ApproximateSignedRankTest(
intake.post,
intake.pre
) |> pvalue |> r3
0.004
We are going to use an example from Altman on the number of CD4 \(^+\) T cells and CD8 \(^+\) T cells in patients with Hodgkin’s disease or with disseminated malignancies (the Non-Hodgkin’s disease group).
hodgkin = rcopy(R"pubh::Hodgkin")
hodgkin |> schema
┌───────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├───────┼───────────────┼──────────────────────────────────┤
│ CD4 │ Count │ Int64 │
│ CD8 │ Count │ Int64 │
│ Group │ Multiclass{2} │ CategoricalValue{String, UInt32} │
└───────┴───────────────┴──────────────────────────────────┘
Generate a new variable, named ratio that will contain the ratio of CD4 \(^+\) / CD8 \(^+\) T cells.
@transform!(
hodgkin,
:ratio = :CD4 / :CD8
);
Generate a table with descriptive statistics for ratio, stratified by Group.
pubh.estat(@formula(ratio ~ Group), data=hodgkin) |> rcopy
| Row | Outcome | Group | N | Min | Max | Mean | Median | SD | CV |
|---|---|---|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | ratio | Non-Hodgkin | 20.0 | 1.1 | 3.49 | 2.12 | 2.15 | 0.73 | 0.34 |
| 2 | | Hodgkin | 20.0 | 0.47 | 3.82 | 1.5 | 1.19 | 0.91 | 0.61 |
Let’s compare the distributions of the ratios with a QQ-Plot:
let
hodg = DataFrames.combine(
groupby(hodgkin, :Group),
:ratio=>(x->fit(Normal, x))=>:d
)
plot(
hodg,
x=:d, y=hodgkin.ratio, color=:Group,
Stat.qq,
Guide.xlabel("Theoretical quantiles"),
Guide.ylabel("CD4 / CD8"))
end
I know that, for the normal, healthy population, about 60% of T cells are CD4 \(^+\) and about 40% are CD8 \(^+\), i.e., a ratio of about 1.5. Given this, I know that the population showing abnormal levels is the group of non-Hodgkin’s lymphoma (see descriptive analysis). I would not be interested in knowing the confidence intervals of that difference.
Given that:
- The sample size is relatively small.
- The distribution of CD4 \(^+\) / CD8 \(^+\) T cells is not the same in the two groups.
- Small changes (regardless of magnitude) in the distribution of T cell populations have significant biological consequences.
I would perform a non-parametric test. Once I know that this difference is statistically significant (i.e., very unlikely due to chance), I would conduct further studies to find out more about what is happening at a cellular and molecular level.
Would it be wrong to perform a parametric test? Not at all, as long as the rationale and assumptions are clear. What is wrong is to perform both tests. We are not going to do that; we will perform only the Mann-Whitney test.
MannWhitneyUTest(
vec_group(hodgkin, :ratio, :Group)...
) |> pvalue |> r3
0.007
4 ANOVA
We will use a dataset on infant birth weight (in kg) and the smoking status of their mothers at the end of the first trimester.
smokew = read_rds("data/smokew.rds") |> rcopy
smokew |> schema
┌─────────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├─────────┼───────────────┼──────────────────────────────────┤
│ bweight │ Continuous │ Float64 │
│ smoking │ Multiclass{4} │ CategoricalValue{String, UInt32} │
└─────────┴───────────────┴──────────────────────────────────┘
We can start with descriptive statistics.
pubh.estat(@formula(bweight ~ smoking), data=smokew) |> rcopy
| Row | Outcome | smoking | N | Min | Max | Mean | Median | SD | CV |
|---|---|---|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | bweight | Non-smoker | 7.0 | 2.82 | 4.18 | 3.45 | 3.41 | 0.44 | 0.13 |
| 2 | | Ex-smoker | 5.0 | 2.64 | 3.73 | 3.29 | 3.32 | 0.41 | 0.13 |
| 3 | | Light-smoker | 7.0 | 2.14 | 3.77 | 2.88 | 2.82 | 0.52 | 0.18 |
| 4 | | Heavy-smoker | 8.0 | 2.23 | 3.23 | 2.73 | 2.73 | 0.33 | 0.12 |
Given the number of observations per group, we use a strip plot to compare the four groups graphically.
smokew_bst = pubh.gen_bst_df(
@formula(bweight ~ smoking),
data=smokew
) |> rcopy
| Row | bweight | LowerCI | UpperCI | smoking |
|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Cat… |
| 1 | 3.45 | 3.18 | 3.74 | Non-smoker |
| 2 | 3.29 | 2.94 | 3.58 | Ex-smoker |
| 3 | 2.88 | 2.56 | 3.22 | Light-smoker |
| 4 | 2.73 | 2.52 | 2.96 | Heavy-smoker |
strip_error(
smokew,
:smoking, :bweight,
xlab = "Smoking status",
ylab = "Birth weight (kg)"
)
4.1 Initial assumptions
Normality can be tested using the Shapiro-Wilk test. The null hypothesis is that the distribution of the error is normal. We could look at the distribution of bweight for each group with QQ-plots. We will check for normality after fitting the model.
Homoscedasticity (homogeneity of variances) can be tested with Bartlett’s or Levene’s test of variance. The null hypothesis is that the variances are equal (homogeneous).
LeveneTest(
vec_group(smokew, :bweight, :smoking)...
) |> pvalue |> r3
0.88
4.2 Model
We will perform the ANOVA after first fitting a linear model with lm:
model_smoke = lm(@formula(bweight ~ smoking), smokew)
anova(model_smoke)
Analysis of Variance
Type 1 test / F test
bweight ~ 1 + smoking
Table:
────────────────────────────────────────────────────────────
DOF Exp.SS Mean Square F value Pr(>|F|)
────────────────────────────────────────────────────────────
(Intercept) 1 252.76 252.76 1392.6307 <1e-21
smoking 3 2.4085 0.8028 4.4234 0.0135
(Residuals) 23 4.1744 0.1815
────────────────────────────────────────────────────────────
Not all groups of babies have the same mean birth weight. At least one of them is statistically different to another (\(p=0.014\)). From the descriptive statistics, we know that the cohort of babies born from non-smoker mothers have a mean birth weight significantly higher than those born from heavy-smoker mothers.
4.3 Post-hoc tests
So far, we know that there is evidence that at least the cohort of babies born from non-smoker mothers has a mean birth weight higher than those born from heavy-smoker mothers, but we do not know about any other comparison.
If we performed all possible pairwise t-tests between the groups, we would inflate our type I error. To avoid that, we adjust our confidence intervals and p-values for multiple comparisons. There are several methods for making the adjustment.
We can use the function empairs to do the pairwise comparison and then adjust corresponding p-values with functions from MultipleTesting.
BH_adj(pvals) = MultipleTesting.adjust(PValues(pvals), BenjaminiHochberg());
empairs(model_smoke; dof=dof_residual, padjust=BH_adj)
| Row | smoking | bweight | err | dof | t | Pr(>|t|) |
|---|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | Non-smoker > Ex-smoker | 0.153143 | 0.249453 | 23.0 | 0.613914 | 0.545297 |
| 2 | Non-smoker > Light-smoker | 0.57 | 0.227719 | 23.0 | 2.50309 | 0.0595751 |
| 3 | Non-smoker > Heavy-smoker | 0.713393 | 0.220488 | 23.0 | 3.23552 | 0.0219269 |
| 4 | Ex-smoker > Light-smoker | 0.416857 | 0.249453 | 23.0 | 1.67108 | 0.162388 |
| 5 | Ex-smoker > Heavy-smoker | 0.56025 | 0.24287 | 23.0 | 2.30679 | 0.0608319 |
| 6 | Light-smoker > Heavy-smoker | 0.143393 | 0.220488 | 23.0 | 0.650344 | 0.545297 |
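The Benjamini-Hochberg adjustment used above can be sketched by hand: sort the raw p-values, multiply \(p_{(i)}\) by \(m/i\), then enforce monotonicity from the largest rank down. The p-values below are made up for illustration (they are not the post-hoc results above):

```julia
using MultipleTesting

pvals = [0.005, 0.011, 0.02, 0.04, 0.045, 0.20]   # hypothetical raw p-values, already sorted

m = length(pvals)
adj = [min(p * m / i, 1.0) for (i, p) in enumerate(pvals)]
for i in (m - 1):-1:1
    adj[i] = min(adj[i], adj[i + 1])   # enforce non-decreasing adjusted p-values
end

adj_pkg = adjust(PValues(pvals), BenjaminiHochberg())
# adj and adj_pkg agree
```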
We compared the birth weights of babies born from mothers of four different smoking status: non-smokers, ex-smokers, light-smokers and heavy-smokers with one-way ANOVA. We obtained a significant result (p = 0.014) that merited a post-hoc analysis. For the post-hoc analysis, we adjusted p-values for multiple comparisons by the method of Benjamini-Hochberg. After adjusting for multiple comparisons, the only statistical difference found was between babies born from non-smoker mothers and babies born from heavy-smoker mothers (p = 0.022). On average, babies born from non-smoker mothers had a birth weight of 0.71 kg higher than babies born from heavy-smoker mothers.
4.4 Diagnostics
smoke_perf = model_perf(model_smoke);
Normality:
resid_plot(smoke_perf)
Variance:
rvf_plot(smoke_perf)
Mean absolute error:
mean(abs.(smoke_perf.error)) |> r3
0.303
Mean absolute percentage error:
mape(smoke_perf) |> r3
0.004
Root mean square error:
rmse(smoke_perf) |> r3
0.393
4.4.1 Effects
In treatment coding of categorical variables (the default), the hypothesis for the coefficients for each level is against the reference level.
model_smoke
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
bweight ~ 1 + smoking
Coefficients:
────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────────
(Intercept) 3.44714 0.161021 21.41 <1e-15 3.11404 3.78024
smoking: Ex-smoker -0.153143 0.249453 -0.61 0.5453 -0.669176 0.362891
smoking: Light-smoker -0.57 0.227719 -2.50 0.0199 -1.04107 -0.0989279
smoking: Heavy-smoker -0.713393 0.220488 -3.24 0.0037 -1.16951 -0.257279
────────────────────────────────────────────────────────────────────────────────────
In effects coding of categorical variables, the hypothesis for the coefficients for each level is against the mean across all levels.
model_eff = lm(
    @formula(bweight ~ smoking), smokew;
    contrasts=Dict(:smoking => EffectsCoding())
)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}
bweight ~ 1 + smoking
Coefficients:
────────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────────
(Intercept) 3.44714 0.161021 21.41 <1e-15 3.11404 3.78024
smoking: Ex-smoker -0.153143 0.249453 -0.61 0.5453 -0.669176 0.362891
smoking: Light-smoker -0.57 0.227719 -2.50 0.0199 -1.04107 -0.0989279
smoking: Heavy-smoker -0.713393 0.220488 -3.24 0.0037 -1.16951 -0.257279
────────────────────────────────────────────────────────────────────────────────────
To look at the effects, we first generate a reference grid:
model_des = Dict(:smoking => unique(smokew.smoking))
smoke_eff = effects(model_des, model_eff)
| Row | smoking | bweight | err | lower | upper |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | Non-smoker | 3.44714 | 0.161021 | 3.28612 | 3.60816 |
| 2 | Ex-smoker | 3.294 | 0.190523 | 3.10348 | 3.48452 |
| 3 | Light-smoker | 2.87714 | 0.161021 | 2.71612 | 3.03816 |
| 4 | Heavy-smoker | 2.73375 | 0.150622 | 2.58313 | 2.88437 |
We can use the estimated effects to get a nice visual of the data:
effect_plot(
smoke_eff,
:smoking, :bweight,
xlab = "Smoking status",
ylab = "Birth weight (kg)"
)
4.5 Alternatives for non-Normal data
The airquality dataset has daily readings on air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.
air = rcopy(R"datasets::airquality") |> dropmissing
air.Month = recode(
air.Month,
5 => "May",
6 => "Jun",
7 => "Jul",
8 => "Aug",
9 => "Sep"
)
coerce!(air, :Month => Multiclass)
levels!(air.Month, unique(air.Month))
air |> schema
┌─────────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├─────────┼───────────────┼──────────────────────────────────┤
│ Ozone │ Count │ Int64 │
│ Solar_R │ Count │ Int64 │
│ Wind │ Continuous │ Float64 │
│ Temp │ Count │ Int64 │
│ Month │ Multiclass{5} │ CategoricalValue{String, UInt32} │
│ Day │ Count │ Int64 │
└─────────┴───────────────┴──────────────────────────────────┘
Calculate descriptive statistics for Ozone by Month.
pubh.estat(@formula(Ozone ~ Month), data=air) |> rcopy
| Row | Outcome | Month | N | Min | Max | Mean | Median | SD | CV |
|---|---|---|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | Ozone | May | 24.0 | 1.0 | 115.0 | 24.12 | 18.0 | 22.89 | 0.95 |
| 2 | | Jun | 9.0 | 12.0 | 71.0 | 29.44 | 23.0 | 18.21 | 0.62 |
| 3 | | Jul | 26.0 | 7.0 | 135.0 | 59.12 | 60.0 | 31.64 | 0.54 |
| 4 | | Aug | 23.0 | 9.0 | 168.0 | 60.0 | 45.0 | 41.77 | 0.7 |
| 5 | | Sep | 29.0 | 7.0 | 96.0 | 31.45 | 23.0 | 24.14 | 0.77 |
Look at the relative dispersion; the distribution of ozone is clearly not normal. The months of July and August have the highest median concentrations of ozone. We do not know yet, if the month with the highest median ozone concentration (July), is significantly different from the one with the lowest median ozone concentration (May). In June, ozone concentrations were recorded for only nine days.
Use Levene’s test to test for homoscedasticity.
LeveneTest(
vec_group(air, :Ozone, :Month)...
) |> pvalue |> r3
0.002
4.5.1 Log-transformation
We can log-transform to make the distribution closer to Normal and the variance constant between groups.
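Note that back-transformed summaries of logged data are geometric means, \(\exp(\overline{\log x})\), not arithmetic means; this is why effects reported on the original scale after a log model are interpreted multiplicatively. A quick sketch with made-up skewed data:

```julia
using Statistics

x = [12.0, 23.0, 45.0, 67.0, 110.0]   # hypothetical right-skewed concentrations

geo_mean   = exp(mean(log.(x)))   # back-transformed mean of the logs
arith_mean = mean(x)
# the geometric mean is ≤ the arithmetic mean and less sensitive to the right tail
```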
We check for normality for each group.
let
@transform!(air, :log_oz = log.(:Ozone))
air_qq = DataFrames.combine(
groupby(air, :Month),
:log_oz => (x -> fit(Normal, x)) => :d
)
plot(
air_qq,
x = :d, y = air.log_oz, color = :Month,
Stat.qq,
Guide.xlabel("Normal quantiles"),
Guide.ylabel("log (Ozone)")
)
end
Normality seems to be good enough, though we will check the distribution of the residuals later. What about homoscedasticity?
LeveneTest(
vec_group(air, :log_oz, :Month)...
) |> pvalue |> r3
0.553
We can proceed to fit an ANOVA model to our data.
model_air = lm(
@formula(log_oz ~ Month), air;
contrasts=Dict(:Month => EffectsCoding())
)
anova(model_air)
Analysis of Variance
Type 1 test / F test
log_oz ~ 1 + Month
Table:
───────────────────────────────────────────────────────────
DOF Exp.SS Mean Square F value Pr(>|F|)
───────────────────────────────────────────────────────────
(Intercept) 1 1295.21 1295.21 2170.4984 <1e-71
Month 4 19.22 4.8040 8.0506 <1e-04
(Residuals) 106 63.25 0.5967
───────────────────────────────────────────────────────────
Diagnostics:
air_perf = model_perf(model_air);
Normality:
resid_plot(air_perf)
Variance:
rvf_plot(air_perf)
air_des = Dict(:Month => unique(air.Month))
air_eff = effects(air_des, model_air, invlink=exp)
| Row | Month | log_oz | err | lower | upper |
|---|---|---|---|---|---|
| | String | Float64 | Float64 | Float64 | Float64 |
| 1 | May | 16.8862 | 2.66266 | 14.2235 | 19.5488 |
| 2 | Jun | 25.4489 | 6.55297 | 18.8959 | 32.0019 |
| 3 | Jul | 48.6103 | 7.3643 | 41.2459 | 55.9746 |
| 4 | Aug | 45.6385 | 7.35119 | 38.2873 | 52.9896 |
| 5 | Sep | 24.998 | 3.58588 | 21.4121 | 28.5839 |
effect_plot(
air_eff,
:Month, :log_oz,
xlab = "Month",
ylab = "Ozone (ppb)"
)
4.5.2 Kruskal-Wallis test
We are going to look at an RCT about treatment of children suffering from frequent and severe migraine.
fent = rcopy(R"pubh::Fentress")
fent |> schema
┌───────┬───────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├───────┼───────────────┼──────────────────────────────────┤
│ pain │ Continuous │ Float64 │
│ group │ Multiclass{3} │ CategoricalValue{String, UInt32} │
└───────┴───────────────┴──────────────────────────────────┘
Calculate statistics of pain by group from the fent dataset.
pubh.estat(@formula(pain ~ group), data=fent) |> rcopy
| Row | Outcome | group | N | Min | Max | Mean | Median | SD | CV |
|---|---|---|---|---|---|---|---|---|---|
| | String | String | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | pain | Untreated | 6.0 | -288.0 | 100.0 | -55.0 | -36.0 | 139.63 | -2.54 |
| 2 | | Relaxation | 6.0 | 43.0 | 100.0 | 84.0 | 96.0 | 23.3 | 0.28 |
| 3 | | Biofeedback | 6.0 | 37.0 | 91.0 | 70.67 | 74.0 | 19.39 | 0.27 |
Compare pain reduction between groups with a strip chart.
fent_bst = pubh.gen_bst_df(
@formula(pain ~ group),
data=fent
) |> rcopy
| Row | pain | LowerCI | UpperCI | group |
|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Cat… |
| 1 | -55.0 | -162.78 | 36.0 | Untreated |
| 2 | 84.0 | 65.16 | 98.67 | Relaxation |
| 3 | 70.67 | 55.5 | 82.83 | Biofeedback |
strip_error(
fent,
:group, :pain,
xlab = "Cohort",
ylab = "Pain reduction"
)
What is your main concern regarding your descriptive analysis?
Dispersion of pain reduction is greater in the Untreated group than in the other two groups, and the sample size is too small for the central limit theorem to compensate.
We are going to perform the non-parametric, Kruskal-Wallis test, to test if the differences in pain reduction between groups are statistically significant or not.
KruskalWallisTest(
vec_group(fent, :pain, :group)...
) |> pvalue |> r3
0.057
We did not find a significant difference between pain reduction in the untreated group and treatment groups (either relaxation or with biofeedback) (\(p=0.057\), Kruskal-Wallis test).